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LIKELIHOOD  RATIO  GRADIENT  ESTIMATION: 
AN  OVERVIEW 


Peter  VV.  Glynn 

Department  of  Industrial  Engineering 
University  of  Wisconsin 
Madison,  WI  53706 


ABSTRACT 

The  likelihood  ratio  method  for  gradient  estimation  is  brief¬ 
ly  surveyed.  Two  applications  settings  are  described,  namely 
Monte  Carlo  optimization  and  statistical  analysis  of  complex 
stochastic  systems.  Steady-state  gradient  estimation  is  empha¬ 
sized,  and  both  regenerative  and  non-regenerative  approaches 
are  given.  The  paper  also  indicates  how  these  me'hods  apply 
to  general  discrete-event  simulations;  the  idea  is  to  view  such 
systems  as  general  state  space  Markov  chains. 


ing  Monte  Carlo  gradient  estimation  is  for  the  purpose  of  opti¬ 
mizing  complex  stochastic  systems.  More  precisely,  consider  a 

stochastic  system  depending  on  d  decision  variables  9,,93 _ (V 

Let  a(9)  (9  =  (9,,..., 9.))  be  the  expected  “cost"  of  running  the 
system  at  parameter  choice  9. 

A  powerful  method  for  computing  the  value  9‘  which  min¬ 
imizes  a(  )  is  the  Robbins-Monro  algorithm.  This  technique 
recognizes  that,  under  suitable  regularity  on  a( ),  S'  must  be  a 
9-root  of  the  equation 


‘  ( 

1.  INTRODUCTION 


Va(9)  =  0, 


(2.1) 


\  >  v 

Consider  a  single-server  queuje  in  which  the  service  rate's  is 

a  decision  variable.  Given  that  +(9)  is  the  steady-state  cost  of 
running  the  queue  at  parameter  level  9,  one  is  frequently  inter¬ 
ested  in  minimizing  a(«)  over  a  suitable  constraint  set.  Since  a(  ) 
is  often  difficult  to  evaluate  analytically,  Monte  Carlo  optimiza¬ 
tion  is  an  attractive  methodology.  By  analogy  with  determinis¬ 
tic  mathematical  programming,  efficient  Monte  Carlo  gradient 
estimation  is  typically  an  important  ingredient  of  simulation 
based  optimization  algorithms.  As  a  consequence,  gradient  es¬ 
timation  has  recently  attracted  considerable  attention  in  the 
simulation  community.  It  Is"t»ur  goal,  in  this  paper,  to  describe 
one  such  method  for  estimating  gradients  in  the  Monte  Carlo 
setting,  namely  the  likelihood  ratio  method.  y  / _ 


where  Va(9)  is  the  gradient  of  a( )  evaluated  at  9.  The  idea  then 
is  to  construct  a  stochastic  recursion  which  has  the  root  9*  as 
its  limit  point. 

This  approach  is  most  clearly  illustrated  when  d  =  l.  In  this 
case,  such  a  recursion  is  given  by 

9„+t  =  - -V.H-i  (2.2) 

n 

(o  >  0)  where  the  V^’s  mimic  a'( )  in  expectation.  More  precisely, 
one  is  required  to  compute  V„’s  with  the  property  that 


In  Section  2,  we  describe  two  important  problems  which 
motivate  our  study  of  efficient  gradient  estimation  algorithms. 
Section  3  is  devoted  to  the  derivation  of  the  likelihood  ratio  gra¬ 
dient  estimate  for  transient  estimation  problems  in  a  discrete¬ 
time  Markov  chain  setting.  Section  4  extends  the  methodol¬ 
ogy  to  steady-state  gradient  estimation  by  using  regenerative 
structure;  in  Section  5,  a  technique  for  non-regenerative  sys¬ 
tems  is  explored.  Section  6  describes  the  specialization  of  these 
techniques  to  the  Markov  chains  associated  with  discrete-event 
simulations,  while  Section  7  states  some  conclusions. 

2.  EFFICIENT  GRADIENT  ESTIMATION:  MOTI¬ 
VATING  APPLICATIONS 

As  indicated  in  the  Introduction,  one  motivation  for  study- 


E{V„t,|V„,9o,-.-.F.,9„}  =  Q'(9„)  ...  (2.3) 

Under  appropriate  additional  hypotheses,  it  then  follows  that 
there  exists  finite  a  such  that 

9„  —  9*  ....  as  n  —  oo 

(2.4) 

„‘/Z(9„- 9')  =*ffAT(0,l) 

where  A(0, 1)  is  a  standard  normal  r.v.  The  key  result  in  (2.4) 
is  the  central  limit  theorem  which  asserts  that  9„  converges  to 
S'  at  rate  Op(n~'! ’).  (A  stochastic  sequence  {H„  :  n  >  0}  is  said 
to  be  0p(o„)  if  (a :  n  >  0)  is  tight.)  Since  a  convergence 
rate  Opln-1^)  is  typically  the  best  that  one  can  expect  of  a 
Monte  Carlo  algorithm  (because  of  central  limit  effects),  this 
suggests  that  recursive  algorithms  of  the  form  (2.2)  should  lead 
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to  reasonably  efficient  procedures  for  calculating  9*.  Of  course, 
the  critical  component  of  such  an  algorithm  is  the  sequence  of 
gradient  estimates  (derivative  estimates  when  i  =  l)  {V„  :  n  >  0} 
appearing  in  (2,3).  Thus,  efficient  stochastic  optimization  is 
one  setting  which  requires  gradient  estimation. 

A  second  problem  context  which  leads  naturally  to  gradi¬ 
ent  estimation  is  statistical  estimation  for  complex  stochastic 
systems.  As  an  example,  consider  a  single-server  infinite  ca¬ 
pacity  queue  in  which  the  inter-arrival  distribution  Fa  and  ser¬ 
vice  distribution  F,  are  unknown.  Suppose  that  one  is  given 
data  X,  for  the  inter-arrival  times  and  observations 

y1,...,ym  for  the  service  times,  with  the  goal  of  estimating  the 
steady-state  queue-length  a.  The  parameter  a  may  then  be 
regarded  as  a  function  of  the  inter-arrival  and  service  time  dis¬ 
tributions  i.e.  a  =  a{F.,F.).  If  F‘  and  F;  are  respectively  the 
“true”  inter-arrival  and  service  time  distributions,  our  goal  here 
is  to  estimate  a"  =  a{F;,F;)  from  the  data. 

Assume  that  F‘,F‘  are  elements  of  one  parameter  fam¬ 
ilies  of  distributions  { Ab (^i ) } ,  {/'. (S3) ) ,  respectively,  such  that 
F’  =  F„($;),F;  =  F.($;).  We  can  then  reduce  the  problem  of 
estimating  a"  to  that  of  determining  5(9;, 9;),  when  5(9,, 9,)  = 
“(Ta(9,),/;(92)).  For  example,  if  F„[>i)  and  F.{f2)  are  both  expo¬ 
nential,  the  resulting  system  is  an  M/M/1  queue  with  (5  can 
be  calculated  analytically  here) 


5,9.  9,1  =  !  <»»/*>)(»  -  (»•/»>»**.  *«<*» 

'  ( oc,  9,  >  9j. 

On  the  other  hand,  if  F„( )  and  F.(  )  are  uniform  on  [0,9,]  and 
0, 9jl  respectively,  5  is  not  available  in  closed  form,  and  Monte 
Carlo  evaluation  may  be  necessary. 

The  natural  estimate  for  a"  is  5  =  5(9, ,93),  where  9,  is  an 
estimate  for  9;  calculated  from  X,,...,X,  and  93  is  an  estimate 
for  9;  derived  from  V, ,  . . . ,  V*,;  o( )  is  a  Monte  Carlo  estimate  of 
5( ).  To  calculate  the  error  in  5  as  an  estimate  of  a-,  note  that 


5  -  a’  =  [5(9,, 9*3)  -  5(9*, ,92)| 

+  15(9*1,9*,)  -5(9), 9J)|. 


(2.5) 


The  first  term  on  the  right-hand  side  of  (2.5)  is  error  incurred 
from  the  Monte  Carlo  estimation  of  5(9,, 93);  the  second  term, 
which  is  (conditionally)  independent  of  the  first,  reflects  the 
intrinsic  error  in  n*  due  to  uncertainty  in  the  data  sets.  The 
error  in  the  first  term  can  be  estimated  from  conventional  out¬ 
put  analysis  procedures.  For  the  second,  note  that  if  a( )  is 
differentiable,  then 


5(9,, 9*,)  -5(9;. 9;)  a  V5(9-)(9*-9-). 


Typically,  the  vector  9  -  9‘  will  be  a  mean  zero  multivariate 
normal,  with  a  covariance  matrix  that  can  be  easily  estimated 
from  the  data  sets.  (This  occurs,  for  example,  if  the  9,’s  are 
maximum  likelihood  estimators  for  the  9* ’s . )  To  calculate  the 
distribution  of  the  second  term,  it  therefore  remains  to  compute 
V5(9")  or,  more  precisely,  its  estimator  V5(9).  For  analytically 
intractable  models  (such  as  the  single-server  infinite  capacity 
queue  with  uniform  inter-arrival  and  service  time  distributions), 
this  entails  calculating  a  gradient  via  Monte  Carlo  simulation. 

The  situation  described  above  in  the  single-server  queueing 
context  is  typical  of  many  statistical  problems  that  arise  in  the 
analysis  of  complex  stochastic  systems.  To  fully  resolve  the 
statistical  error  then  generally  requires  Monte  Carlo  estimation 
of  an  appropriate  gradient. 


3.  DERIVATION  OF  LIKELIHOOD  RATIOS  FOR 

MARKOV  CHAINS 

In  this  section,  we  derive  likelihood  ratio  gradient  estima¬ 
tors  for  discrete-time  Markov  chains.  Our  view  is  that  discrete- 
event  simulations  can  be  characterized  probabilistically  as 
discrete-time  Markov  chains.  In  particular,  suppose  that  one 
views  the  “state”  as  incorporating  all  that  information  about 
the  discrete-event  system  which  needs  to  be  computationally 
updated  on  every  transition  of  the  process  (e.g.  event  list,  clock 
times,  and  physical  state).  Thus,  one  can  view  a  computer  pro¬ 
gram  for  a  discrete-event  simulation  as  an  implementation  of 
the  recursion 


=  /„(*„,  rt.*,)  (3-D 

where  X„  is  the  “state”  of  the  system  at  the  n’th  transition,  and 
is  a  vector  incorporating  all  new  random  variables  which 
need  to  be  computed  in  order  to  calculate  A,.,,  from  X„.  The 
mappings  /„  are  complicated  functions  which  are  rarely  consid¬ 
ered  explicitly  by  the  simulator,  but  which  are  mathematical 
representations  of  the  computational  algorithm  used  to  obtain 
Xn+i  from  Xn  and  nn-ti-  We  will  return  to  this  point  in  Section 
6  when  we  consider  generalized  semi-Markov  processes.  In  any 
case,  any  sequence  (At,  :  n  >  o)  satisfying  (3.1)  is  Markov,  since 


|A0,...,A„)  =  P(/„(X„,n„<i)e  IX.)  =  <?„(*,,  ) 

where  <?„(i,  )  =  P{/„(x, )•  The  above  equality  follows  from 
the  fact  that  the  new  r.v.’s  which  are  generated  at  the  (n  *  i)'st 
transition  are  independent  of  everything  previously  generated. 
In  most  discrete-event  simulations,  the  transition  mechanism  is 
time-homogeneous,  so  that  /„  =  /  and  n„+i=n;  the  Markov  chain 


{X,  :  n  >  0}  is  then  itself  time-homogeneous  so  that  <3*  s  Q, 

Note  that  for  most  discrete-event  systems,  the  Markov 
chain  X  =  {Jf„  :  n  >  0}  defined  by  (3.1)  has  both  a  complicated 
state  space  and  complex  transition  rule.  To  simplify  our  ex¬ 
position  here,  we  therefore  start  by  considering  likelihood  ratio 
gradient  estimates  for  discrete  state  space  Markov  chains.  For 
each  t  in  some  open  set,  suppose  that  P{))  is  the  transition  ma¬ 
trix  associated  with  the  choice  5  of  the  parameter  value.  We 

further  assume  that  a  cost  j(0,to . «„)  is  incurred  when  the 

sample  sequence  (X0 . X„)  takes  on  the  values  (i0,. ..,«»).  In 

this  case,  the  expected  “cost"  of  running  the  chain  X  at  param¬ 
eter  value  S  takes  the  form 


fore  obtains  that  Va(«)  =  £,oVj(0).  Specifically,  one  has  the 
relation 


where 


§-{>)  =  f£(«,X . X.)L.{»)  +  s(t,  X0 . Xn) ±L„{t) 


and 


a(«)  =  £,»(«.  X0 . X„)  (3.2) 

where  E»( )  reflects  the  fact  that  the  probabilistic  dynamics  of 
X  are  governed  by  F(9). 

If  Eil  )  were  independent  of  t,  our  solution  to  the  Monte 
Carlo  gradient  estimation  problem  would  be  trivial,  namely  to 
simulate  i.i.d.  replicates  of  the  random  vector  Vj(«,  Af0,- •  •  ,X>)- 
The  trick  is  therefore  to  transform  (3.2)  into  a  representation 
where  the  expectation  operator  is  independent  of  S.  To  accom¬ 
plish  this,  observe  that 


l*lt,X o) 


n-1 

£ 


ap(9,x,,x,^) 

at. 


Pie,x„x„t) 


Thus,  by  simulating  X0 . X„  under  initial  distribution  *,(»,,) 

and  transition  matrix  P(0O),  we  can  calculate  dg(6)/dt,  and  there¬ 
by  estimate  Va (S).  Observe  that  the  estimator  Vj(0)/d«,  con¬ 
tains  the  product  terms 


l  m  =  17  PI*'*'-*'")  M(g.-Vo) 

&Wo.X,,XM)  M(to,Xo)- 


n-  1 

'(<)  •  *0.  to)  fX  P{t,'k, *k  +  l) 


=  £  . >»)Ln(t,‘o . <nM4>,<o)  17  P(>0,‘H,’k+l) 

♦  o  •  •  •  »*  k  =  0 

(3.3) 


We  claim  that  the  choice  So  =  S'  is  particularly  convenient 
for  evaluating  Vo(S').  In  this  case,  L„(t')  =  1  so  that  the  com¬ 
putation  involved  in  calculating  the  estimator  Vj(0')  is  reduced. 
Furthermore,  for  large  «,  this  choice  substantially  reduces  the 
variance  of  VJ(S').  To  see  this,  note  that  £»„  £„(«')  =  l,  so  that 


where 


in(S,«o,  ■  ■  ■  ,I„) 


t»(S,»o)  tt  FIS.h.h^i) 
W^o.'o)  Plto,*k,*kfl) 


(3.4) 


We  assume  here  implicitly  (as  throughout  this  paper)  that  ap¬ 
propriate  positivity  conditions  are  in  force  so  as  to  guarantee 
that  no  divisions  by  zero  occur  in  (3.4). 


Returning  to  (3.4),  we  can  easily  verify  that 


a(»)  =  E',f(t,X . . *„)£„(<)  =  E,,m  (3.5) 


v»r*0i„(S')  =  E,„Ll{«‘)  -  1. 


Assuming  that  P(J0)  is  positive  recurrent  with  stationary  prob¬ 
abilities  *(f0). 


ilogt’(f)  =  ]Tlog 


P^t'.X^Xn.,) 
P3{S  o,Xk,Xt  +  l) 


+  i  bg  4^4 

n  M2(^o,A'o) 


»(So,>)F(^o.<,; 


f>2(<’1.',;)  ) 

P3(«o.>,;)J 


=  (£> 


where 


Hence, 


i„(S)  =  £„(«,* o . X„) 


-E„c  log  Ll(t')~P 
n 


The  crucial  point  in  (3.5)  is  that  the  expectation  operator  ap¬ 
pearing  on  the  right-hand  side  is  independent  of  t.  One  there-  so  that,  by  Jensen's  inequality, 


..w-MWgWWWWM  w»  i 
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>  «*p(nvs/2) 

for  n  sufficiently  large.  Since  £<0LJ(«')  >  (£,„I„(0’))3  =  l,  it  fol¬ 
lows  that  *>  >  0  (with  strict  inequality  holding  when  I„(0')  is 
non-deterministic).  We  conclude  that  if  0O  #  0',  the  variance  of 
Ln (S')  generally  grows  exponentially  fast  in  n.  One  would  expect 
this  exponential  growth  to  significantly  impact  the  variance  of 
Vj(g')  for  large  n. 

We  turn  now  to  the  generalization  of  this  approach  to 
Markov  chains  having  a  general  state  space;  this  generaliza¬ 
tion  is  necessary  in  order  to  apply  this  methodology  to  Markov 
chains  of  the  type  arising  in  discrete-event  simulation.  The  ana¬ 
log  of  the  initial  distribution  vector  p(0)  is  an  initial  probability 
distribution 


p(0,/t)  =  P,{Xo'A) 

whereas  the  transition  matrix  P(S)  is  replaced  by  the  transition 
kernel 


P(e,z,A)  =  P'{Xn„iA\Xn  =  x) 

We  require  that  *,(S),  P(0)  have  densities,  in  the  sense  that 


m(0,4)  =  J 

P(I,*,A)  =  / 


y)^(dv) 

g(0,  *,»)£(*,<* v). 


(3.6) 


for  some  (o-finite)  measures  <,,P(x,  ).  It  is  easily  verified  that 


o(0)  =  £»g(0,X _ _ X„) 

=  E,'9lS,X0,....JC„)L„(f) 


where 


L  m  =  n  p(0. 

'  «(«o.AT0) 

and  £,( )  is  the  expectation  operator  corresponding  to  the 
probability  measure  P,{X0cdz0,  ...,X„e3x„)  =  p(0,<ix 0) 
nr.o  P(6.*k,4*k-n)-  As  in  the  discrete  state  space  case,  choos¬ 
ing  0O  =  *'  maxes  sense  in  evaluating  Vo(0')  via  Monte  Carlo 
simulation.  In  this  case,  we  obtain  Va(0’)  =  EtcVg(9'),  where 


f£(0)  =  f£(0',  X„, . . . ,  X„)  +  g(0',  Xo, .  . ,  Xn)£-L„(t')  (3.7) 


and 


Si*,,,,,  _  i±(e'.Xo)/de,  ,  V'1  aP(0'tjfJ.x,*1)/30. 

«■  «(»■•*•!  h  WX;x„x\ 


(3.8) 


Thus,  under  the  density  hypothesis  (3.6),  it  is  straightfor¬ 
ward  to  calculate  an  unbiased  estimator  for  Va(0'): 

1.  Generate  X0,...,X„  under  p(0')  and  P(f'). 

2.  Calculate  the  r.v.’s  3j(0)/80,  and  3L„(0')/30,  from  (3.6) 

and  (3.7)  and  the  sample  path  X0 . X„  generated  in 

1. 

By  replicating  steps  1  and  2,  one  can  easily  construct  an  es¬ 
timator  (just  use  the  sample  mean)  which  converges  to  Vo(0') 
at  rate  Op(i~VJ)  (use  the  multivariate  centra'  limit  theorem)  in 
the  computational  effort  I.  We  have  therefore  obtained  a  gradi¬ 
ent  estimator  which  converges  at  the  best  possible  Monte  Carlo 
convergence  rate,  namely  Op (*_ */2). 

Variants  of  the  gradient  estimator  algorithm  described 
above  have  been  analyzed  in  Glynn  (1986a),  Reiman  and  Weiss 
(1986),  and  Rubinstein  (1986). 


4.  STEADY-STATE  GRADIENT  ESTIMATORS: 

REGENERATIVE  ANALYSIS 

The  method  outlined  in  Section  3  was  valid  for  cost  func¬ 
tionals  g(0,Xo,...,X„)  which  depend  on  the  chain  X  up  to  a  de¬ 
terministic  finite  time  horizon  n.  In  fact,  the  method  is  equally 
valid  for  functionals  g(0,Xo,..  •  ,Xrl  depending  on  the  chain  up 
to  a  stopping  time  T.  To  be  precise,  suppose  that 


o(0)  =  £,9(0,XO . XT), 

where  £s( )  is  the  expectation  on  the  path-space  of  X  =  (X„  : 
n  >  o)  corresponding  to  initial  distribution  «(0)  and  transition 
kernel  P(0).  Then,  Vo(0')  =  £0^9(0')  where 


■  ■  Xr ) 


+  g(0',X„,....Xr) 


3u(0',Xr.)/38, 
v[0',  Xo) 


y-  3r(0',X;.Xj-i)/3<, 

4r,  p(f’,x,,x,^) 


An  alternative  estimator  can  be  developed  when  j(0,Xo,. . .  ,Xr) 
is  an  additive  functional  of  the  form 


T 

9(0,Xo,...,XT)  =  ^8(0,X,). 

,  =  0 

In  this  case,  we  can  use  the  fact  that 
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We  can  try  to  approximate  Va(«)  via  fcY  V  j„(«'),  where 


aJ^p-=x-^aK[i\xi)iae, 

*  n  k-0 


. 


.  1  V'1,,.,  f  3u(0\X„)/3*,  ,  V.‘3p(r*„X,+,)/3*. 

» £  ‘  •  *'  p^o)  i  "if — 


By  conditioning  in  (4.1)  and  using  relations  (4.2)  and  (4.3),  we 
find  that 


£*•-^-5(0  = -EVA(M') 


The  first  sum  in  (4.6)  satisfies  a  strong  law,  and  therefore 
converges.  The  second  quantity  on  the  right-hand  side  is  a 
product  of  two  factors,  the  first  of  which  satisfies  a  strong  law 
(4.4)  with  limit  a(tf').  The  second  factor,  which  involves  a  sum  of 
terms  of  the  form  |£(4', X, ,X, +  t)/p(S',  X„X,i.t),  can  be  analyzed 
via  the  central  limit  theorem  (use  (4.3)),  yielding 


mm')  =  E^-ms',*,)  +  Em*'.*,) 


»',/J  E 


dp{i',X,,X,+  >)/96. 


rV'.X^X,^) 


■  <rN{  0,1) 


f^aptt'.Xe.Xt+^/ae,  du(s',x0)/ai, ) 
[%,  +  »(»'.  *«)  J 


for  some  constant  a.  By  squaring  both  sides  of  (4.7)  and  taking 
expectations,  we  find  that 


J=0  j*  0  rx  ,  ;  <=j  +  l 


v„..  J  y  ap(«*.x,.x,*»)/a«, 
lytS  P^.X.-Xy  +  l) 


u(J',*o) 


This  suggests  that 


Relations  (4.3)  and  (4.5)  will  prove  useful  in  our  regenerative 
analysis  of  steady-state  gradient  estimation. 


....  ®  g„(0')  2  _ 

*’  ~ai - 17  ■  “l* )  n 


as  n  —  oo.  We  conclude  that  we  can  expect  the  variance  of 
Consider  a  family  of  transition  kernels  P(0)  having  unique  dj„[$')/dSi  to  increase  linearly  with  n.  Thus,  in  trying  to  ap- 

stationary  distributions  w(0),  and  suppose  that  we  wish  to  cal-  proximate  a  steady-state  gradient,  the  approximants  become 

culate  the  gradient  of  increasingly  less  stable.  This  conclusion,  which  was  previously 

observed  by  Reiman  and  Weiss  (1986),  leads  one  to  look  for 
r  alternative  approaches. 


r  alternative  approacnes. 

a(0)  =  I  w(9,  dx)h{6,  x). 

One  way  to  do  this  is  to  assume  that  the  sequence  X  = 
Of  course,  a(0)  may  generally  be  regarded  as  the  expected  cost,  { x„  :  «  >  0}  possesses  readily  identifiable  regenerative  structure. 


under  P9{ ),  of  the  functional 


1  ll”  * 

?(*,*o,...)  =  ummr-EM*.*»)- 


In  this  case,  assuming  that  p  is  a  regenerative  initial  condition 
with  T  its  associated  regeneration  time,  the  ratio  formula  of 
regenerative  analysis  shows  that 


Note  that 


g.EL~oM*.*>)  _  »(«) 


a{0)  w  lim  inf  E«gn{9,  Xo, . . . ,  *n-i) 


fc(».X0 . *„-.)  =  ;EM»,*.) 


=  E  “’(*.**)/ E>T 
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where  =  R(i,x)  -  a(6').  It  remains  to  evaluate  the  above  ALGORITHM  B: 

partial  derivative  in  terms  of  a  quantity  amenable  to  Monte 

1.  Choose  a  san 

Carlo  estimation.  ,  . 


By  applying  (4.1),  we  find  that 


jrE,^2w[S,Xk)  -  52 


£  dp{e/*’’\+i)!^Tf/w,,X')  -  <>(«')} 

'  Dio  f  Xj  ,  + 


Hence,  Va(it')  can  be  estimated  by  using  the  following  algo- 


ALGORITHM  A: 


1.  Choose  a  sample  size  n  >  l,  where  n  corresponds  to  the 
number  of  regenerative  cycles  to  be  simulated. 

2.  Generate  a  sample  path  Jf0, ...  (i.e.  generate  X  over  a 

regenerative  cycle)  under  PIS’). 

3.  Calculate  the  quantities: 

«u  =  T 
r-i 

*u  =  £  W.Xj) 

3=0 
T-l  g 

^>  =  T,jfh{e''Xi) 

3=0  ‘ 


rJ4  =  £ 


dp($',x„x,+1)/d6, 

££  p(«\X„X,^) 


(T  -  1  -  ;) 


**-  f  *%%*?**  £  *(»'.*«) 

p!o.a„a,+1) 


1.  Choose  a  sample  size  n  >  1,  where  n  corresponds  to  the  4  Replicate  steps  2  and  3  n  times,  thereby  obtaining  R,,,  l  < 

number  of  regenerative  cycles  to  be  simulated.  «  <  n  1  <  j  <  5. 

2.  Generate  a  sample  path  Afo . Aj--,  (i.e.  generate  X  over  a  5  Calculate 

regenerative  cycle)  under  /"(«') . 

3.  Calculate  the  quantities: 


Qn=T 

r-i 

«.=  =  £m«',^) 

3  =  0 


0.4  =  £ 

j  =  0 


£  h((',Xt) 

%  p(9.x„x,^) 

4.  Replicate  steps  2  and  3  n  times,  thereby  obtaining  Q,,,l  < 
1  <  n,  1  £  ;  <  5. 

5.  Calculate 


3, a,  (S')  = 


<?:.(")  +  <?s(n)  _  Q;(n)  (?,(n) 

<3lM  +  <3lW  <Jl(n)  0l(n)  ' 


where  <J,(n)  =  <?„/n(l  <  «  <  5);  this  estimator  converges 

to  da[B,)/d$x  as  n  — *  oo. 

A  second  regenerative  estimator  for  Va(S')  uses  (4.3)  and  (4.5). 
It  is  easily  shown  that 


3,03  (S')  = 


■ft3(n)  Rj(n)  _  ^(n)  fl4(n) 
R,(n)  J?,(n)  fl,(n)  R,(n) 


where  A,(n)  =  R>./"(1  <  t  <  5);  this  estimator  converges  to 

3q(S')/3S,  as  n  — *  00. 

It  is  easily  verified,  via  standard  arguments,  that  the  estima¬ 
tors  described  in  Algorithms  A  and  B  converge  at  rate  Op(t- 1/2) 
in  the  computational  effort  t. 


5.  NON-REGENERATIVE  STEADY-STATE  GRA¬ 
DIENT  ESTIMATORS 

We  turn  now  to  the  case  where  the  sequence  X  =  {X„  :  n  >  0} 
exhibits  no  obvious  regenerative  structure.  The  regenerative 
results  of  Section  4  actually  provide  the  key  to  the  analysis. 

Turning  to  (4.8),  we  note  that  the  second  sum  can  be  ex¬ 
pressed  as  a  steady-state  expectation  i.e. 


E,T  %  p(9,X„X,.,) 


This  gives  rise  to  a  second  algorithm  for  estimating  Va(p') 


where  E»-  is  the  expectation  on  the  path-space  of  X  associated 
with  transition  kernel  P(0f)  and  initial  distribution  *(£  )•  ^or 
the  first  term,  a  more  intricate  analysis  is  necessar\. 

Let  rkX  =  (A*,*Vm.  ■  )■  h°r  a  function  /  defined  on  the 
(48)  infinite  product  space,  an  easy  extension  of  the  regenerative 
ratio  formula  proves  that 
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£  n^X)  =  Erf(X) 

jaO 

Applying  this  formula  to 


„  y.  a  l(T>l) 

/(Af)  ~  as/1*  '*0,Xl)  p(«',x0,Xi) 


we  obtain  the  relation 

W.a(,,) 

=  E,.±.h(9\Xo) 


+  *"• t  (M**.. X.)  -  •(•*)>- 


Let  T, ,  r2 ,  be  the  successive  regeneration  times  for  X.  By  the 
ratio  formula  for  regenerative  processes 

£rz*v:lw.x,) 

Er(Tk+,-Tt) 

and  hence 


Tk  +  1 ”  1 

£,.  V  (*|IU,|-«(('|)  =  0. 

i-T* 


9,  )  =  j  P(i,  x,  )<r(9,di), 


(52) 


where  S  is  the  state  space  of  X.  Then,  it  is  reasonable  to  assume 
that  PI)  can  be  expanded  as 


P{6‘  +  he.)  =  P{»')  +  fi<3,(9’)  +  o(h) 


(S3) 


where  e,  is  the  t’th  unit  vector.  If  *(0’  +  he,)  is  (formally)  differ¬ 
entiable  at  h  =  0,  there  exists  a  measure  rj,  (S')  such  that 


*(6'  +  hc%)  =  *(0')  ■+■  +  o(/i). 


(5.4) 


Plugging  (5.4)  and  (5.3)  into  the  stationarity  equation  (5.2) 
and  collecting  terms  in  h  yields 


•?.(»'.  ■)-  f  T).[B',dz)P{r,x,  ) 

=  J  *[»',dx)Q, ((',«,  )• 


In  operator  form,  this  can  be  written  as 


*(»')(/- P(»'))  =  *(9')Q.(9'). 


(5.5) 


By  the  independence  of  regenerative  cycles,  we  get 


Mp'.X„) 

+  A.^(»'1X..Xl)  ^IWM,)-a(f)) 

+  XI  *o.  Xi)  ^  ^  ^  > 


p{8> ,  Xo.  X 


Let  n  — *  oo  and  we  obtain 


3  f  3  v  ,  *  V  E  ±4*  X  X  1  (M<'.X,)-o(9')_) 

—»(«)  =  £,.  —h(t , Xo)  +  ^ £.■  p(*  . Xo,  x.)  „(9,iJf0iJri) 


-  £r  +  S  d«. p(s'' Xo'  P(9', Xo,  x~ 


(5.1) 


We  wish  to  solve  for  ij,(9').  Let  ri(0',i,  )  =  *(0',  )  for  all  zcS. 
Observe  that  (5.4)  implies  that  r,.(9',S)  =  0  (just  divide  by  h  and 
let  5  —  0),  and  hence 


J  !).(»', 4i)n(0',x,  )  =  0 

Substituting  (5.6)  into  (5.5)  provides 


(5.6) 


((4.3)  was  used  in  the  last  equality).  The  important  point  is 
that  expression  (5.1),  while  derived  from  a  regenerative  argu¬ 
ment,  is  independent  of  regenerative  structure. 

The  same  expression  can  be  found  via  a  totally  different 
argument.  Recall  that  the  stationary  distribution  *(0)  satisfies 


n.(9')(/-P(«')  +  n(0'))  =  »(0')<J.(«'). 

Now,  for  many  Markov  chains  (in  particular,  aperiodic  positive 
recurrent  Harris  chains),  P*(0',x,  )  —  *(9\  )  for  all  xeS,  and  it 
therefore  makes  sense  to  assume  that 


(/  -  Pin  +  n(»'))-1  =  /  +  Dp*Ij')  -  n(*'» 

fc=  l 

exists.  (Just  use  the  identity  P(9')n(0')  =  n(0')  =  n(9')P(0').) 
Thus, 


(5.7) 


(use  (5.6)  again).  Recall  that 
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h——a[9')  •»  *[9'  +  —  ir{0')/i(0'] 

d6% 

«/»,.(«>(*') +  »(«')^  *(•*)* 


the  infinite  sum  (5.1).  This  approach  appeared  previously  in 
(5.8)  Glynn  (1986b). 


(Expand  and  collect  terms  in  h  again.)  Substituting  (5.7)  into 
(5.8),  we  get 


-  I  W'Z]  (5  9) 

+  I|f  *(»'.<<*)  f  Q.V'.x.dy)  f  P*(«',y,i,)5(9',«). 
k?oJs  Js  Js 

We  now  identify  Qt{9',xtdy}  in  our  current  framework.  Note  that 
(see(3.5)) 


PfP'  +  Ae,,  *,  A)  =  P(6',z , 


— ^'.sr.v) 


P  (£’,  £■  dy) 
p(*',*.y) 


so 


<?.(*'.  *.<<y) 


z,y)/de, 

?(«'.*. y) 


P(»',*,dy). 


(5  10) 


Substituting  into  (5.9)  yields  (5.1).  Formula  (5.1)  is  the  funda¬ 
mental  relation  for  non-regenerative  stochastic  systems.  Notice 
that  the  first  term  on  the  right-hand  side  of  (5.1)  can  be  con¬ 
sistently  estimated  via 


(5.11) 


whereas  the  j’th  term  in  the  infinite  sum  appearing  there  can 
be  estimated  by  using 


6.  LIKELIHOOD  RATIO  GRADIENT  ESTIMA¬ 
TION  FOR  DISCRETE)- EVENT  SIMULATIONS 

As  indicated  in  Section  3,  discrete-event  simulations  can  be 
viewed  as  Markov  chains  living  on  a  general  state  space.  To 
be  precise,  discrete-event  simulations  can  be  viewed  mathemat¬ 
ically  as  a  “generalized  semi-Markov  process’  (GSMP).  Such  a 
process  is  characterized  by: 

S:  a  “physical”  state  space  which  is  countable  (e  g  5 
might  be  the  set  of  all  possible  queue-length  vectors  for 
a  queueing  simulation) . 

E:  a  set  of  events  to  be  scheduled  (e  g.  for  each  station 
in  a  closed  queueing  network,  one  needs  to  schedule  an 
“end  of  service”  event). 

p(«',j,t):  the  probability  of  jumping  from  j  to  j',  given 
that  event  e  triggers  the  transition  from  j  (e.g.  r  might 
correspond  to  station  i  completing  service,  in  which  case 
p(s';  >,  e)  might  represent  the  probability  of  sending  a  cus¬ 
tomer  from  station  i  to  station  ;;  here  = 

r„:  the  rate  at  which  clock  t  runs  down  to  zero  in  state 
s  (e  g.  in  a  queueing  network,  r„  might  be  unity  except 
for  events  e  which  are  “interrupted”  in  state  s,  in  which 
case  r„  =  o). 

F(  \s',t', «,«):  the  probability  distribution  which  sched¬ 
ules  event  s'  in  state  s',  given  that  the  previous  state 
was  i  and  the  transition  was  triggered  by  r  (e  g.  these 
might  be  service  time  distributions  in  a  closed  queueing 
network). 


1  V"  S  if  Y  V  1 

-1*  xi  xT 


p(£\  A*,  Xk*  i ) 


(512) 


A  standard  device  for  estimating  the  entire  infinite  sum  is  to 
use  an  estimator  which  combines  (5.11)  and  (5.12),  namely  to 
use 


In  calculating  gradients,  we  allow  p(j’;  s,c)  and  F(  .s',  r'.s.i)  to  de¬ 
pend  on  the  decision  parameter  f ;  the  likelihood  ratio  method  is 
generally  inapplicable  to  problems  in  which  5,  E,  or  r„  depends 
on  S.  We  further  require  that  /■(£;  ,  s’.c'.j.r)  have  a  density  for 
which  the  support  is  independent  of  6,  so  that 


tin  f  . 

£  ■sr*rx>.x» 

'  1  fc«0 


p(£',  A'*,  Aj,-i) 


where  C(n)  is  keyed  to  the  sample  size  n  in  such  a  way  that 
<(n)  —  oo  with  f(n)/n  —  0.  The  particular  choice  of  <(n)  effects 
a  compromise  between  bias  and  variance  effects  in  estimating 


f(£,dz;/,e\r,e)  =  /(£,  z;  s',  t,  s.  r)p(  di) 

where  (z  :  )\fi,z,s', »',«,«)  >  0)  is  independent  of  £.  (Thic  is  the 
analogue  of  the  positivity  condition  discussed  in  Section  3.) 
This  density  hypothesis  rules  out  point  mass  distributions  in 
which  P  controls  the  location  of  the  points:  the  independent 
support  condition  does  not  permit  uniform  distributions  with 
support  on  |0, £!. 
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To  make  a  discrete-event  simulation  Markov,  we  consider 
the  state  of  the  simulation  at  transition  epochs.  Specifically, 
set  XH  =  (S„,C„),  where  S„  is  the  “physical”  state  occupied  at 
transition  n  and  C„  is  the  state  of  the  “clocks”  on  the  event 
scheduling  list  at  the  n’th  transition.  Then,  (X„  :  n  >  0}  is 
a  Markov  chain  with  a  complicated  state  space  (inclusion  of 
the  clocks  makes  the  state  space  uncountable).  To  study  the 
ergodic  behavior  of  a  GSMP  { K (t)  :  t  >  0),  note  that 


p(«',sJ+1,s,,*;) 


The  algorithm  discussed  in  Sections  4  and  5  can  then  be  ap¬ 
plied  to  general  discrete-event  simulations,  by  substituting  (6.2) 
appropriately. 


7  /><’ 


Evid 

fc * 0  Vfc 


for  t  large,  where  Ck  is  the  time  spent  in  the  *’th  state  visited, 
and  .V (t)  is  the  number  of  transitions  by  time  t.  (Note  that  C; 
is  a  simple  function  of  C*,  namely  the  minimal  value  of  C\,/rs>< 
taken  over  all  clocks  e.)  If  the  GSMP  is  well  behaved,  we  can 
expect  that 


-  /  a(K(s))dj  -  £,.a(S„)C0-/£,.C(I  P.a.j 
i  Jo 

as  t  —  oo.  The  objective  of  calculating  steady-state  gradients 
for  GSMP's  therefore  reduces  to  estimating  the  gradients  of 
£*  a(S0)CJ  and  E,  C This  can  be  done  by  the  methods  of  Sec¬ 
tions  4  and  5  (apply  to  h(Xk)  =  o(5»)C;  and  h(Xkj  =  C;).  It 
remains  only  to  identify  the  analogue  of 


-p()',X,,X,  *,)/>(»’.  Xj.X,.,) 


7.  CONCLUSION 

We  have  shown  that  gradient  estimation  plays  an  important 
role  in  the  optimization  of  stochast:c  systems,  as  well  as  in 
their  statistical  analysis.  The  likelihood  ratio  method  described 
here  is  easily  applied  to  discrete-event  simulations  of  arbitrary 
complexity  (see  Section  6),  and  does  not  require  case-by-case 
analysis  for  implementation.  On  the  other  hand,  this  method  is 
inapplicable  to  problems  in  which  the  settings  of  deterministic 
event  times  are  decision  variables.  (See  the  density  conditions  in 
Section  6.)  Such  problems  frequently  arise  in  a  manufacturing 
context.  Nevertheless,  we  believe  that  the  methods  described 
here  form  a  promising  avenue  for  future  research. 
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for  this  particular  class  of  Markov  chains  in  which  X,  =  (S,,C,). 
Note  that  under  parameter  !,  X,.,  =  is  obtained 

from  X ,  =  [S,,C, )  by: 

a. )  making  a  state  transition  from  5,  to  S,kl  with  a  proba¬ 

bility  pfS.Sj.i.Sj.eJ),  where  e]  is  the  event  that  triggered 
the  transition  from  S,. 

b. )  certain  clocks  belonging  to  the  (random)  set  0,kl  con¬ 

tinue  to  be  scheduled  in  5,,,  and  run  down  determinis¬ 
tically  there. 

c. )  the  remaining  events  eeN,*.,  active  in  S,. ,  are  scheduled 

according  to  the  distributions  F(«,  and  set 

to  new  values  C,.i ,. 

The  analogue  of  (6.1)  can  be  easily  verified  to  be 


REFERENCES 

Glynn,  P.  W.  (1986a).  Stochastic  approximation  for  Monte 
Carlo  optimization.  In:  Proceedings  of  the  1986  Winter  Simu¬ 
lation  Conference,  p.  356-365. 

Glynn,  P.  W.  (1986b).  Sensitivity  analysis  for  stationary  proba¬ 
bilities  of  a  Markov  chain.  In:  Proceedings  of  the  Fourth  Army 
Conference  on  Applied  Mathematics  and  Computing. 

Reiman,  M.  I.  and  Weiss,  A.  (1986).  Sensitivity  analysis  via 
likelihood  ratios.  In:  Proceedings  of  the  1986  tVmtrr  Simula¬ 
tion  Conference,  p.  285-289. 

Rubinstein,  R.  Y.  (1981).  Sensitivity  analysis  and  performance 
extrapolation  for  computer  simulation  models,  Technical 
Report,  Harvard  University,  Cambridge,  MA. 


AUTHOR’S  BIOGRAPHY 

PETER  W.  GLYNN  is  an  associate  professor  in  the  Depart¬ 
ment  of  Industrial  Engineering  at  the  University  of  Wisconsin- 
Madison.  He  received  a  B.Sc.  in  mathematics  from  Carleton 
University  in  1978  and  a  Ph  D.  in  operations  research  from 
Stanford  University  in  1982.  His  current  research  interests  in- 


Likelihood  Ratio  Gradient  Estimation:  An  Overview 


elude  Monte  Carlo  simulation,  computational  probability,  and 
queueing  theory.  He  is  a  member  of  ORSA,  TIMS,  the  IMS, 
and  the  Statistical  Society  of  Canada. 

Peter  W.  Glynn 

Department  of  Industrial  Engineering 
University  of  Wisconsin-Madison 
1513  University  Avenue 
Madison,  WI  53706 
608/263-6790 


